Warn and exit non-zero on Podman/Infisical secret drift #37

Merged: jdoss merged 1 commit into master from fix/drift-detection on Apr 23, 2026


@jdoss commented on Apr 23, 2026

Summary

  • `psi setup` now detects when `<workload>--*` Podman secrets exist that
    aren't in the current Infisical fetch. Each stale name is logged as a
    WARNING with a one-line remediation pointer, and `run_setup` raises
    `DriftDetectedError` at the end so the systemd unit exits non-zero.
  • `psi setup --dry-run` gains a "Workload drift" section that diffs each
    workload's drop-in `Secret=` targets against its `<workload>--*` Podman
    secrets, reporting stale Podman secrets and dangling drop-in refs per
    workload.
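The dry-run diff in the second bullet amounts to two set differences per workload. The sketch below is a minimal illustration, not psi's implementation. It assumes that drop-in lines use the Quadlet form `Secret=<workload>--KEY,type=env,target=KEY` and that Podman secret names are available as a plain list of strings. `WorkloadDrift`, `drop_in_secret_names`, and `workload_drift` are hypothetical names.

```python
# Sketch only: names below are hypothetical, not psi's API.
# Assumes drop-in lines look like Secret=<workload>--KEY,type=env,target=KEY.
from dataclasses import dataclass, field


@dataclass
class WorkloadDrift:
    # Podman secrets under <workload>--* that no Secret= line references.
    stale_podman: list[str] = field(default_factory=list)
    # Secret= references under <workload>--* with no matching Podman secret.
    dangling_refs: list[str] = field(default_factory=list)


def drop_in_secret_names(drop_in_text: str) -> set[str]:
    """Return the secret name (first comma-separated field) of each Secret= line."""
    names: set[str] = set()
    for raw in drop_in_text.splitlines():
        line = raw.strip()
        if line.startswith("Secret="):
            names.add(line[len("Secret="):].split(",", 1)[0])
    return names


def workload_drift(workload: str, drop_in_text: str, podman_secrets: list[str]) -> WorkloadDrift:
    """Diff one workload's drop-in references against its Podman namespace, both ways."""
    prefix = f"{workload}--"
    in_podman = {name for name in podman_secrets if name.startswith(prefix)}
    # Only compare references that live in this workload's namespace.
    in_drop_in = {name for name in drop_in_secret_names(drop_in_text) if name.startswith(prefix)}
    return WorkloadDrift(
        stale_podman=sorted(in_podman - in_drop_in),
        dangling_refs=sorted(in_drop_in - in_podman),
    )


if __name__ == "__main__":
    drop_in = (
        "[Container]\n"
        "Secret=app--DB_URL,type=env,target=DB_URL\n"
        "Secret=app--API_KEY,type=env,target=API_KEY\n"
    )
    drift = workload_drift("app", drop_in, ["app--DB_URL", "app--MODE", "other--X"])
    print(drift.stale_podman)   # ['app--MODE']
    print(drift.dangling_refs)  # ['app--API_KEY']
```

The prefix filter keeps each report scoped to one workload: `other--X` is ignored for `app`. A name such as `app-worker--X` does not match the `app--` prefix either.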

Why

`_register_secrets` only deletes-then-recreates the names it's given, so
any Podman secret that falls out of the fetch persists. It still resolves
via the shell driver, but `_generate_drop_in` omits it, so containers
silently boot without the env var. This bit us when secrets moved into an
Infisical subfolder and `recursive: true` was not set on the source:
drop-ins regenerated without those keys, stale Podman secrets kept
resolving, and the failure surfaced as an unrelated port collision 65
minutes into a reboot.

Recursion stays opt-in; this PR just makes the drift loud instead of
changing defaults.
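The toy model below makes the failure mode concrete. A dict stands in for the Podman secret store. `register_secrets` and `generate_drop_in` are hypothetical stand-ins that keep only the behavior described above: delete-then-recreate only the fetched names, and emit one `Secret=` line per fetched key. They are not psi's code.

```python
# Toy model only: a dict stands in for the Podman secret store, and both
# functions are hypothetical reductions of the behavior described in the PR.


def register_secrets(store: dict[str, str], workload: str, fetched: dict[str, str]) -> None:
    # Delete-then-recreate only the names in the current fetch; others are untouched.
    for key, value in fetched.items():
        name = f"{workload}--{key}"
        store.pop(name, None)
        store[name] = value


def generate_drop_in(workload: str, fetched: dict[str, str]) -> list[str]:
    # One Secret= line per key in the current fetch, nothing else.
    return [f"Secret={workload}--{key},type=env,target={key}" for key in sorted(fetched)]


store: dict[str, str] = {}
# First run: both keys are fetched from the source path.
register_secrets(store, "app", {"DB_URL": "a", "MODE": "worker"})
# MODE moves into a subfolder while recursion is off, so it drops out of the fetch.
second_fetch = {"DB_URL": "a"}
register_secrets(store, "app", second_fetch)
drop_in = generate_drop_in("app", second_fetch)

print(sorted(store))  # ['app--DB_URL', 'app--MODE']  (the stale secret survives)
print(drop_in)        # ['Secret=app--DB_URL,type=env,target=DB_URL']  (no MODE line)
```

Nothing in this flow errors out. The stale secret still exists, but no container references it anymore, which is exactly the gap this PR makes loud.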

Test plan

  • `uv run pytest -q` (366 passed)
  • `uv run ruff check psi/ tests/`
  • `uv run ruff format --check psi/ tests/`
  • `uv run ty check`
  • Deploy and verify: trigger drift by adding a secret to Infisical at
    a path not covered by any source, run `psi setup`, and confirm the
    WARNING in the journal and the non-zero exit.
  • Run `psi setup --dry-run` against the homelab config and confirm the
    new "Workload drift" section lists the known stale `windmill-*--MODE`,
    `--NUM_WORKERS`, and `--WORKER_GROUP` entries.

`_register_secrets` only deletes-then-recreates the names it is given,
so any `<workload>--*` Podman secret that falls out of the fetch
persists. It still resolves via the shell driver, but `_generate_drop_in`
only writes `Secret=` lines for keys in the current fetch, so containers
boot without the matching env var. This failed silently when a workload's
secrets moved into an Infisical subfolder and `recursive: true` was not
set — the drop-in regenerated without those keys, the stale Podman
secrets stayed functional, and nobody noticed until a container broke.

Fix the silence:

- Between `_register_secrets` and `_generate_drop_in`, compare the
  `<workload>--*` namespace against the fetched set and log a WARNING
  per stale name with a one-line remediation pointer.
- Accumulate drift across workloads; `run_setup` raises
  `DriftDetectedError` at the end so the setup systemd unit (and
  `psi cache refresh`) exit non-zero.
- Extend `psi setup --dry-run` to diff each workload's drop-in `Secret=`
  targets against its `<workload>--*` Podman secrets and report both
  directions per workload.
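The first two bullets can be sketched as a per-workload check that warns on each stale name and a single raise once every workload has been processed. Only `DriftDetectedError` and the overall flow come from the description above. `find_stale`, `check_workload`, `run_setup_sketch`, the logger name, and the warning wording are hypothetical.

```python
# Sketch only: DriftDetectedError and the flow come from the PR text; the helper
# names, logger name, and warning wording are hypothetical.
import logging

log = logging.getLogger("psi.setup")


class DriftDetectedError(RuntimeError):
    """Raised once, after every workload is processed, if any drift was found."""


def find_stale(workload: str, podman_secrets: list[str], fetched_keys: set[str]) -> list[str]:
    """Names in the <workload>--* namespace that the current fetch did not produce."""
    prefix = f"{workload}--"
    expected = {prefix + key for key in fetched_keys}
    return sorted(n for n in podman_secrets if n.startswith(prefix) and n not in expected)


def check_workload(workload: str, podman_secrets: list[str], fetched_keys: set[str]) -> list[str]:
    """Log one WARNING per stale name and return the stale names."""
    stale = find_stale(workload, podman_secrets, fetched_keys)
    for name in stale:
        log.warning(
            "Podman secret %s is not in the current Infisical fetch; "
            "check the source path and recursive setting, or run `podman secret rm %s`",
            name,
            name,
        )
    return stale


def run_setup_sketch(workloads: dict[str, set[str]], podman_secrets: list[str]) -> None:
    """Check every workload, then raise if any of them drifted."""
    drift: dict[str, list[str]] = {}
    for workload, fetched_keys in workloads.items():
        # Per the PR, psi runs this check between _register_secrets and _generate_drop_in.
        stale = check_workload(workload, podman_secrets, fetched_keys)
        if stale:
            drift[workload] = stale
    # Raise only after the loop so one drifted workload doesn't stop the others.
    if drift:
        total = sum(len(names) for names in drift.values())
        raise DriftDetectedError(f"{total} stale secret(s) across {len(drift)} workload(s)")


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    try:
        run_setup_sketch({"app": {"DB_URL"}}, ["app--DB_URL", "app--MODE"])
    except DriftDetectedError as exc:
        print(exc)  # 1 stale secret(s) across 1 workload(s)
```

If the exception propagates uncaught, the Python interpreter exits with status 1. That alone is enough for systemd to mark a oneshot unit as failed. The description does not say whether psi lets the exception propagate or maps it to an explicit exit code.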
@jdoss merged commit f9e9d6e into master on Apr 23, 2026
2 checks passed
